Late multimodal fusion for image and audio music transcription
Abstract
Music transcription, which deals with the conversion of music sources into a structured digital format, is a key problem for Music Information Retrieval (MIR). When addressing this challenge in computational terms, the MIR community follows two lines of research: music documents, which is the case of Optical Music Recognition (OMR), or audio recordings, which is the case of Automatic Music Transcription (AMT). The different nature of the aforementioned input data has conditioned these fields to develop modality-specific frameworks. However, their recent definition in terms of sequence labeling tasks leads to a common output representation, which enables research on a combined paradigm. In this respect, multimodal image and audio music transcription comprises the challenge of effectively combining the information conveyed by image and audio modalities. In this work, we explore this question at a late-fusion level: we study four combination approaches in order to merge, for the first time, the hypotheses regarding end-to-end OMR and AMT systems in a lattice-based search space. The results obtained for a series of performance scenarios, in which the corresponding single-modality models yield different error rates, showed interesting benefits of these approaches. In addition, two of the four strategies considered significantly improve the corresponding unimodal standard recognition frameworks.
Similar resources
Multimodal medical image fusion based on Yager’s intuitionistic fuzzy sets
The objective of image fusion for medical images is to combine multiple images obtained from various sources into a single image suitable for better diagnosis. Most state-of-the-art image fusion techniques are based on nonfuzzy sets, and the resulting fused image lacks complementary information. Intuitionistic fuzzy sets (IFS) are determined to be more suitable for civilian, and medi...
Automatic Music Transcription and Audio Source Separation
In this article, we give an overview of a range of approaches to the analysis and separation of musical audio. In particular, we consider the problems of automatic music transcription and audio source separation, which are of particular interest to our group. Monophonic music transcription, where a single note is present at one time, can be tackled using an autocorrelation-based method. For p...
Automatic Music Transcription using Audio-Visual Fusion for Violin Practice in Home Environment
Violin practice in a home environment, where there is often no teacher available, can benefit from automatic music transcription to provide feedback to the student. This paper describes a high-performance violin transcription system with three main contributions. First, as onset detection is an important but challenging task for automatic transcription of pitched non-percussive music, such as f...
Lyrics-Based Audio Retrieval and Multimodal Navigation in Music Collections
Modern digital music libraries contain textual, visual, and audio data describing music on various semantic levels. Exploiting the availability of different semantically interrelated representations for a piece of music, this paper presents a query-by-lyrics retrieval system that facilitates multimodal navigation in CD audio collections. In particular, we introduce an automated method to time a...
Journal
Journal title: Expert Systems With Applications
Year: 2023
ISSN: 1873-6793, 0957-4174
DOI: https://doi.org/10.1016/j.eswa.2022.119491